Many alternatives to classical frequentist statistics, and to hypothesis testing in particular, have been proposed:
Bayesian statistics
Lowering significance threshold from \(0.05\) to \(0.005\)
Focus on effect sizes and uncertainty intervals
My opinion: \(P\)-values are here to stay
Improve interpretation of tools that we have
\(P\)-value functions: Graphical tool to achieve that
\(P\)-values
Definition
The \(p\)-value is the probability, computed under the null hypothesis, of obtaining a test statistic at least as extreme as the one actually observed. “Extreme” means farther from the null value.
Informally, \(p\)-values are a continuous measure of compatibility between the data and a hypothesis, given a set of background information (for details, see here).
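To make the definition concrete, here is a minimal sketch in base R. It computes a two-sided \(p\)-value for a one-sample \(t\)-test by hand and compares it with the result of t.test. The data (mtcars$mpg) and the null value mu0 = 22 are assumptions chosen purely for illustration.

```r
# Illustrative data and null value (both are assumptions for this sketch)
x   <- mtcars$mpg
mu0 <- 22

# Test statistic: distance of the sample mean from the null value,
# measured in standard errors
n      <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value: probability under the null of a statistic
# at least as far from zero as the observed one
p_manual <- 2 * pt(-abs(t_stat), df = n - 1)

# The built-in test should give the same value
p_builtin <- t.test(x, mu = mu0)$p.value
all.equal(p_manual, p_builtin)
```

Moving mu0 closer to mean(x) increases the \(p\)-value, which is what "compatibility between the data and a hypothesis" means in practice.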
Confidence intervals and their relation with \(p\)-values
“Our results are most compatible at the 95% level with an effect of high versus low amounts of sitting anywhere from an 8% hazard reduction to 55% increase in the hazard of diabetes.”
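The link between the interval and \(p\)-values can be sketched in base R. The limits 0.92 and 1.55 are read off the quoted sentence (8% reduction, 55% increase). Everything else is an assumption: the sketch treats the interval as a Wald interval that is symmetric on the log hazard ratio scale, which may not match how the original analysis was done.

```r
# Interval limits taken from the quoted sentence (hazard ratio scale)
lower <- 0.92
upper <- 1.55

# Assumption: normal (Wald) interval, symmetric on the log scale
z       <- qnorm(0.975)
log_est <- (log(lower) + log(upper)) / 2        # implied point estimate
log_se  <- (log(upper) - log(lower)) / (2 * z)  # implied standard error

# Two-sided p-value for any hypothesized hazard ratio hr0
p_value_hr <- function(hr0) {
  2 * pnorm(-abs((log_est - log(hr0)) / log_se))
}

# The interval limits are exactly the values with p = 0.05;
# every hazard ratio inside the interval, including 1, has p > 0.05
p_value_hr(c(lower, 1, upper))
```

In other words, a 95% interval collects all hypothesized effect sizes whose \(p\)-value exceeds 0.05, so "no effect" is only one of many values compatible with the data.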
The function p_function allows the creation of \(p\)-value functions directly from fitted models. Here is an example from a linear regression model:
Code
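library(parameters)  # assumption: p_function() comes from the parameters package (its plot method also needs the see package)
# Linear regression of fuel economy on weight, number of gears and transmission
mod <- lm(mpg ~ wt + as.factor(gear) + am, data = mtcars)
# p-value functions for all coefficients, emphasizing the 95% level
p_curve <- p_function(mod, ci_levels = c(emph = 0.95))
plot(p_curve, n_columns = 2)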
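To show what such a curve represents, here is a minimal base R sketch that builds a \(p\)-value function by hand for the wt coefficient of the same model. It assumes the usual \(t\)-based inference for linear models. It is not meant to reproduce the internals of p_function, and the helper name p_fun_wt is made up for this illustration.

```r
mod <- lm(mpg ~ wt + as.factor(gear) + am, data = mtcars)

# Estimate, standard error and residual degrees of freedom for wt
est <- unname(coef(mod)["wt"])
se  <- unname(sqrt(diag(vcov(mod)))["wt"])
df  <- df.residual(mod)

# Hypothetical helper: two-sided p-value for any hypothesized slope beta0
p_fun_wt <- function(beta0) {
  2 * pt(-abs((est - beta0) / se), df = df)
}

# Evaluate the function on a grid of hypothesized values and plot it
beta_grid <- seq(est - 4 * se, est + 4 * se, length.out = 401)
plot(beta_grid, p_fun_wt(beta_grid), type = "l",
     xlab = "Hypothesized coefficient of wt", ylab = "p-value")
abline(h = 0.05, lty = 2)  # cuts the curve at the 95% interval limits

# Check: the 95% confidence limits are the values with p = 0.05
ci <- as.numeric(confint(mod, "wt", level = 0.95))
p_fun_wt(ci)
```

The curve peaks at the point estimate, where \(p = 1\), and a horizontal cut at height \(\alpha\) gives the \((1 - \alpha)\) confidence interval, so one plot displays intervals at every level at once.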
Take-Home-Messages
The practice of dichotomizing results into “significant” and “not significant” is not informative and should probably stop.
Focus on effect sizes and confidence/compatibility intervals.
Create \(p\)-value functions to summarize the available evidence:
To show which values have high and low compatibility with the data.
To compare evidence from different studies.
Never say that “we found no effect/association” or “there was no difference” when \(p>0.05\) (Greenland et al. 2016).
Recommended reading
Bender, Ralf, Gabriele Berg, and Hajo Zeeb. 2005. “Tutorial: Using Confidence Curves in Medical Research.” Biometrical Journal 47 (2): 237–47. https://doi.org/10.1002/bimj.200410104.
Greenland, Sander. 2019a. “Valid P-Values Behave Exactly as They Should: Some Misleading Criticisms of P-Values and Their Resolution with S-Values.” The American Statistician 73 (sup1): 106–14. https://doi.org/10.1080/00031305.2018.1529625.
Greenland, Sander, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman. 2016. “Statistical Tests, P Values, Confidence Intervals, and Power: A Guide to Misinterpretations.” European Journal of Epidemiology 31 (4): 337–50. https://doi.org/10.1007/s10654-016-0149-3.
Infanger, Denis, and Arno Schmidt-Trucksäss. 2019. “P Value Functions: An Underused Method to Present Research Results and to Promote Quantitative Reasoning.” Statistics in Medicine 38 (21): 4189–97. https://doi.org/10.1002/sim.8293.
Rafi, Zad, and Sander Greenland. 2020. “Semantic and Cognitive Tools to Aid Statistical Science: Replace Confidence and Significance by Compatibility and Surprise.” BMC Medical Research Methodology 20 (1): 244. https://doi.org/10.1186/s12874-020-01105-9.